ABSTRACT The genome sequences of intestinal Bacteroidales strains reveal evidence of extensive horizontal gene transfer. In 
vitro studies of Bacteroides and other bacteria have addressed mechanisms of conjugative transfer and some phenotypic out- 
comes of these DNA acquisitions in the recipient, such as the acquisition of antibiotic resistance. However, few studies have ad- 
dressed the horizontal transfer of genetic elements between bacterial species coresident in natural microbial communities, espe- 
cially microbial ecosystems of humans. Here, we examine the genomes of Bacteroidales species from two human adults to 
identify genetic elements that were likely transferred among these Bacteroidales while they were coresident in the intestine. Us- 
ing seven coresident Bacteroidales species from one individual and eight from another, we identified five large chromosomal 
regions, each present in a minimum of three of the coresident strains at near 100% DNA identity. These five regions are not 
found in any other sequenced Bacteroidetes genome at this level of identity and are likely all integrative conjugative elements 
(ICEs). Such highly similar and unique regions occur in only 0.4% of phylogenetically representative mock communities, pro- 
viding strong evidence that these five regions were transferred between coresident strains in these subjects. In addition to the 
requisite proteins necessary for transfer, these elements encode proteins predicted to increase fitness, including orphan DNA 
methylases that may alter gene expression, fimbriae synthesis proteins that may facilitate attachment and the utilization of new 
substrates, putative secreted antimicrobial molecules, and a predicted type VI secretion system (T6SS), which may confer a com- 
petitive ecological advantage to these strains in their complex microbial ecosystem. 

IMPORTANCE By analyzing Bacteroidales strains coresident in the gut microbiota of two human adults, we provide strong evi- 
dence for extensive interspecies and interfamily transfer of integrative conjugative elements within the intestinal microbiota of 
individual humans. In the recipient strain, we show that the conjugative elements themselves can be modified by the transposi- 
tion of insertion sequences and retroelements from the recipient's genome, with subsequent transfer of these modified elements 
to other members of the microbiota. These data suggest that the genomes of our gut bacteria are substantially modified by other, 
coresident members of the ecosystem, resulting in highly personalized Bacteroidales strains likely unique to that individual. The 
genetic content of these ICEs suggests that their transfer from successfully adapted members of an ecosystem confers beneficial 
properties to the recipient, increasing its fitness and allowing it to better compete within its particular personalized gut micro- 
bial ecosystem. 
The human intestine harbors a very dense microbial ecosystem 
containing approximately 10^11 to 10^12 bacteria per g of colonic 
content. The species within this community are diverse; however, 
most of the numerically dominant species are contained within 
two bacterial taxonomic groups, the Gram-positive phylum Fir- 
micutes and the Gram-negative order Bacteroidales (1, 2). There 
are more than 25 different human gut Bacteroidales species, many 
colonizing this ecosystem simultaneously at high density (3, 4). 
Coresident gut Bacteroidales form ecological networks to utilize 
dietary polysaccharides (5), with mutualistic interactions likely 
occurring between these members. Therefore, the presence in the 
human intestinal microbiota of different Bacteroidales species/strains, 
each with different phenotypes and fitness properties, may 
increase the fitness of the Bacteroidales community as a whole. 

Many important molecules of the gut Bacteroidales, such as 
those involved in microbial interactions with the host, other mi- 
crobes, and dietary or abiotic substances, are not encoded by con- 
served genes of a species. These include the immunomodulatory 
polysaccharide molecule PSA of Bacteroides fragilis strain NCTC 
9343, the genes for which are contained in less than one-third of 
B. fragilis strains (6), the B. fragilis enterotoxin (7) implicated in 
colon cancer (8), glycoside hydrolases and polysaccharide lyases 
(5) that allow these bacteria to harvest dietary and host glycans (9, 
10), and secreted antimicrobial molecules (M. Chatzidaki-Livanis, 
M. Coyne, and L. Comstock, submitted for publication) 
predicted to limit local competition. 

TABLE 1 Composition of natural Bacteroidales communities and identification of highly similar regions in strains coresident in a gut microbial 
ecosystem 

[Table body not recovered. Column groups: CL02 region (size [bp]); CL03 region (size [bp]).] 

a All species belong to Bacteroides or Parabacteroides. 
b ✓, the region is present in the organism; +/−, a large, yet partial segment of the region was identified at >99.9%. 
c Type(s) of CRISPR/Cas systems present in the organism. 

Many genes contributing to strain diversity are contained in 
regions likely acquired by horizontal gene transfer (HGT). The 
genomes of gut Bacteroidales strains show evidence of DNA acqui- 
sitions from phage (11), conjugative plasmids (12-14), and con- 
jugative transposons (15, 16). In Bacteroides species, conjugative 
plasmids and conjugative transposons have been studied intensely 
for decades because of the importance of these mobile elements in 
transferring antibiotic resistance genes (12-14, 17, 18). Bacteroi- 
dales conjugative transposons fall within the classification of inte- 
grative and conjugative elements (ICEs), and as such, they encode 
the gene products necessary for conjugative transfer, including the 
mating apparatus, integrases, excisionases, and proteins that reg- 
ulate transfer (reviewed in references 18 and 19). In order for 
conjugative transfer to occur, an ICE must excise from the chro- 
mosome and form a nonreplicative covalently closed circular in- 
termediate. It is thought that a single strand of the element is then 
transferred through a mating apparatus to the recipient, with the 
single strands in both the donor and recipient then being repli- 
cated and the element subsequently being (re)integrated into the 
donor and recipient genomes. Due to the number of genes neces- 
sary for these processes, conjugative transposons are relatively 
large, with those described in Bacteroides averaging approximately 
50 to 80 kb (18). 

As mating aggregates are necessary for the transfer of conjuga- 
tive elements, these processes should be favored in dense micro- 
bial ecosystems. The human gut is an ideal environment for such 
conjugative transfers due to its high density of related Bacteroi- 
dales species. Most studies of the transfer of mobile genetic ele- 
ments (MGEs) of gut bacteria have been performed in vitro or 
with experimental in vivo systems (20-22). Data regarding trans- 
fer within the natural human gut ecosystem are lacking, especially 
regarding the extent of transfer that occurs within an individual 
human's microbiota. One study provided strong evidence for the 
transfer of an 8.9-kb conjugative plasmid among four coresident 



Bacteroidales species in the gut microbiota of a human girl (23). 
This small plasmid contained genes and elements necessary for 
replication and mobilization, such as repA, mobA, mobB, and oriT, 
but not genes required for the mating apparatus. Due to the im- 
portance of MGEs in supplying closely related strains/species with 
genes that may allow them to rapidly adapt to an ecosystem (re- 
viewed in reference 24) and to understand the nature of these 
genetic transfers within an individual's microbiota and how these 
genomes are modified by interaction with other members of the 
ecosystem, we studied coresident Bacteroidales species for evi- 
dence of HGT. We provide evidence for the interspecies and in- 
terfamily transfer of large genetic elements within the gut micro- 
bial ecosystem of two healthy humans. We show that these MGEs 
meet the definition of ICEs or conjugative transposons and carry 
genes predicted to increase the fitness of the recipient. 

RESULTS 

Analysis of coresident Bacteroidales strains for evidence of in- 
traecosystem DNA transfer. Seven strains of different species co- 
colonizing subject CL02 and eight strains of different species co- 
colonizing subject CL03 were included in the analyses, with each 
community including both Bacteroides and Parabacteroides spe- 
cies (Table 1). Within the gut microbiota of each individual, these 
strains were each present at >10^8 CFU/g (3). The genomes comprising 
each of these communities were compared to one another 
at the DNA level using BLAST. To identify DNA regions with the 
best likelihood of intraecosystem transfer, we limited the search to 
identify regions that existed in at least three of the Bacteroidales 
strains of an individual. Moreover, these segments were required 
to be at least 10 kb in length and have at least 99.9% DNA identity 
between strains. These criteria were intentionally conservative to 
avoid detecting small regions coincidentally common between 
strains without necessarily indicating recent transfer. Each of 
these 15 genomes was finished to the draft level, wherein a supercontig 
or scaffold is assembled by linking smaller contigs, often 
separated by long stretches of Ns representing unassigned or am- 
FIG 1 Comparisons of regions 1 to 5 in the three or four genomes containing these MGEs. Differences between strains for each region following sequencing to 
resolve Ns are shown. The remaining SNPs displayed were not tested by sequencing and represent the original genome sequence for each isolate. The positions 
of IS and RE in regions 1 and 2 are shown with the corresponding sizes of these elements. 



biguous residues. As these Ns cause BLAST to split potentially 
contiguous hits into multiple returns, the BLAST files were parsed 
and the results were consolidated and counted as one region if 
there were gaps of <5,000 bp or if the coordinates overlapped. 
These consolidations revealed six large regions of DNA, referred 
to herein as regions 1 through 6, two from the CL02 community 
and four from the CL03 community (Table 1). In general, each of 
the regions was nearly 100% identical between the identified 
strains, with the exception of a few single-nucleotide polymor- 
phisms (SNPs), insertion sequences (IS), and/or retroelements 
(RE) in some regions, as detailed below. 
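
The consolidation step described above can be sketched as a simple interval merge. This is a minimal illustration, not the parsing script used in the study: the function name `consolidate_hits`, the tuple representation of hits, and the example coordinates are assumptions made here; only the 5,000-bp gap threshold and the overlap rule come from the text.

```python
from typing import List, Tuple

MAX_GAP_BP = 5_000  # gaps smaller than this are bridged, as stated in the text


def consolidate_hits(
    hits: List[Tuple[int, int]], max_gap: int = MAX_GAP_BP
) -> List[Tuple[int, int]]:
    """Merge (start, end) query intervals that overlap or lie < max_gap bp apart."""
    # Normalize orientation so that start <= end, then sort by start coordinate.
    normalized = sorted((min(s, e), max(s, e)) for s, e in hits)
    merged: List[Tuple[int, int]] = []
    for start, end in normalized:
        # Number of bases between the previous interval and this one
        # (negative when the two intervals overlap).
        if merged and start - merged[-1][1] - 1 < max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Illustrative query coordinates for one genome whose hits were split.
    split_hits = [(75233, 87535), (1, 59042), (87536, 116095), (59002, 72340)]
    print(consolidate_hits(split_hits))  # [(1, 116095)]
```

Under these assumptions, four fragmented hits collapse into a single region, whereas two hits separated by 5,000 bp or more would remain distinct.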

Region 1 was detected in the CL02 community in Bacte- 
roides cellulosilyticus, Bacteroides salyersiae, and Bacteroides dorei. 
There were several areas where the sequences from these three 
genomes diverged and were not identified as contiguous aligning 
segments in our initial analyses, largely due to assembler- 
introduced Ns. We PCR amplified and sequenced all regions containing 
Ns (see Table S1 in the supplemental material). These 
complete sequences revealed that regions 1 from B. cellulosilyticus 
and B. salyersiae are 100% identical over their entire 24,866-bp 
length (Fig. 1), whereas the B. dorei genome differed from the 
other two by a 12-bp insertion and 12-bp deletion and the pres- 
ence of IS and RE (Fig. 1). The B. cellulosilyticus and B. salyersiae 
genomes contain two IS, referred to here as ISa and ISb, which are 
absent in B. dorei, and B. dorei contains a different IS and an RE, 
referred to here as ISc and REa, both of which are absent in the 
other two genomes (Fig. 1 and 2). Details of these IS and RE are 



contained in Table S2. The patterns of these IS and RE suggest that 
this region initially lacked these elements and was modified by 
preexisting copies from the genome of a recipient/donor. In fact, 
each of the strains containing these IS and RE has, in most cases, 
numerous other copies of these IS and RE in other locations in 
their genome (Table S2). 

Region 2 is very large (116,095 bp) and is present in four of the 
seven isolates of the CL02 community, B. cellulosilyticus, B. dorei, 
B. salyersiae, and Parabacteroides johnsonii. Segments containing 
assembler-introduced Ns were PCR amplified and sequenced (see 
Table S1 in the supplemental material). These data revealed that 
regions 2 are identical among these four strains except for an IS 
element (ISd), present only in P. johnsonii, and two RE, REb, pres- 
ent only in B. salyersiae, and REc, present in both B. salyersiae and 
B. dorei (Fig. 2; Table S2). 

The three regions from the CL03 community contained no 
assembler-introduced Ns and no IS element differences between 
strains. The first of these (region 3) is 17,607 bp and is present in 
CL03 community members Bacteroides uniformis, B. dorei, and 
Parabacteroides merdae at 100% identity (Fig. 1 and 2). 

Region 4 is 60,734 bp and is present in the genomes of CL03 
members B. fragilis, Bacteroides xylanisolvens, and Parabacte- 
roides distasonis. The sequences of these three regions agree per- 
fectly, with the exception of one SNP. The first 44,008 bp of this 
sequence was also present at 100% identity in the Bacteroides ova- 
tus CL03 genome, at the end of scaffold 1.10, and the remaining 
16,726 bp was found in the middle of scaffold 1.3. The disconti- 
FIG 2 Open reading frame (ORF) maps of regions 1 to 5. Regions are oriented so that the majority of the tra genes (red) read left to right. The letter above the 
red genes indicates the particular tra gene. An open reading frame map, excluding variable IS and RE, is shown for each region, with the locations of IS and RE 
indicated. Genes encoding selective orthologous proteins present in each region are color coded as indicated above. Genes comprising the type VI secretion 
system (T6SS) of region 2 are shown (blue). The 24,866-bp region 1 (boxed) and the 17,607-bp region 3 (boxed) are extended to show the likely extent of the 
MGEs that were transferred between strains. 



nuity of region 4 in this strain may be the result of an error in the 
assembly of this genome sequence. 

Region 5 is 42,545 bp and is present in CL03 community mem- 
bers B. fragilis, B. xylanisolvens, and B. uniformis. The regions 5 are 
100% identical between the three genomes, with the exception of 
two SNPs at the very end of the region in B. uniformis (Fig. 1). The 
second half of this region (28,967 bp) was also detected in the 
B. ovatus genome assembly, residing in the middle of scaffold 1.3. 

Region 6 is 44,124 bp and is present in CL03 community members 
B. ovatus, B. xylanisolvens, and P. merdae. This region was not 
further analyzed due to its presence at 100% identity in numerous 
noncommunity members (see below). 

Presence of highly identical regions in other Bacteroidales 
strains. The possibility existed that these DNA segments repre- 
sented very promiscuous MGEs and that their presence in these 
isolates was coincidental and not related to the fact that they were 
coresident. If so, BLAST analysis of these regions against the da- 
tabase of all draft and completed Bacteroidetes genomes should 
reveal other strains not present in these natural ecosystems that 
have similarly sized regions also identical at >99.9%. For each of 
these six regions, BLAST analyses were performed with each of the 
regions with all IS and RE removed to allow the best chance to 
return a similarly conserved region. The results of these BLAST 
analyses revealed that only one of the six regions had >99.9% 
identity to another >10-kb segment from other Bacteroidales 
strains not associated with these natural communities (Table 2). 
CL03 region 6, which is 44,124 bp in length, is present at 100% 
identity in numerous other Bacteroidales strains. In contrast, no 
other sequenced Bacteroidetes genomes contained regions of 
>10 kb that matched regions 1 to 5, even at 99.90% identity, 
whereas the identified regions in coresident strains are 99.99 to 
100% identical to each other, even prior to resolving the Ns 
(Table 2). These data provide strong evidence that regions 1 to 5 
were transferred between coresident strains of the CL02 or CL03 
ecosystems, but the BLAST data do not support the intraecosys- 
tem transfer of region 6. 
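
The uniqueness test applied to each region can be expressed as a short filter. The sketch below is illustrative only: the `Hit` record, its field names, and the use of inclusive thresholds are assumptions introduced here, and the example values are of the kind reported in Table 2 rather than a complete set of hits.

```python
from dataclasses import dataclass
from typing import Iterable

MIN_LENGTH_BP = 10_000    # minimum alignment length considered (from the text)
MIN_IDENTITY_PCT = 99.9   # minimum percent identity considered (from the text)


@dataclass(frozen=True)
class Hit:
    """One BLAST hit of a region against a database genome (hypothetical record)."""
    target: str
    identity_pct: float
    length_bp: int
    coresident: bool  # True if the target strain belongs to the same natural community


def is_unique_to_community(hits: Iterable[Hit]) -> bool:
    """True if no genome outside the community has a long, near-identical match."""
    return not any(
        (not h.coresident)
        and h.length_bp >= MIN_LENGTH_BP
        and h.identity_pct >= MIN_IDENTITY_PCT
        for h in hits
    )


if __name__ == "__main__":
    region3_like = [
        Hit("B. uniformis CL03T12C37", 100.00, 17_607, True),
        Hit("B. eggerthii 1_2_48FAA", 98.53, 17_614, False),
    ]
    region6_like = [
        Hit("B. xylanisolvens CL03T12C04", 100.00, 44_124, True),
        Hit("B. eggerthii DSM 20697", 100.00, 44_124, False),
    ]
    print(is_unique_to_community(region3_like))  # True
    print(is_unique_to_community(region6_like))  # False
```

A region that passes this check has no qualifying match outside its community, which is the pattern reported for regions 1 to 5 but not for region 6.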

Analysis of highly similar regions within the genomes of 
mock communities of Bacteroidales. To estimate the frequency 
with which one might expect to find such long and nearly identical 
DNA segments (i.e., >10 kb and >99.9% identity in three strains) 
among bacteria that were not coresident, we performed a similar 
BLAST search using 1,000 eight-member mock communities of 
Bacteroidales assembled from a set of 84 Bacteroides and Parabac- 
teroides genome sequences of similar quality (see Materials and 
Methods; see Table S3 in the supplemental material). Genomes 
were pseudorandomly assigned to each mock community such 
that no collection contained two genomes of the same species and 
each microbiota contained at least one but not more than two 
Parabacteroides genomes. Each collection was further restrained 
by limiting it to contain no more than one genome of each of the 
CL02, CL03, and CL09 strains, as these groups each represent 
strains collected from three different subjects (3). 
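
The assembly of constrained mock communities can be sketched with rejection sampling. This is an assumption-laden illustration, not the procedure documented in Materials and Methods: the `Genome` record, the toy genome pool, the function names, and the choice of rejection sampling are all hypothetical; only the constraints themselves (eight members, no repeated species, one or two Parabacteroides genomes, and at most one genome from each of the CL02, CL03, and CL09 subjects) are taken from the text.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Genome:
    name: str
    species: str
    genus: str
    subject: Optional[str] = None  # e.g. "CL02"; None for strains not from these subjects


def is_valid_community(community: Sequence[Genome]) -> bool:
    """Check the three constraints stated in the text."""
    species = [g.species for g in community]
    if len(set(species)) != len(species):          # no two genomes of the same species
        return False
    n_para = sum(g.genus == "Parabacteroides" for g in community)
    if not 1 <= n_para <= 2:                       # one or two Parabacteroides genomes
        return False
    per_subject = Counter(g.subject for g in community if g.subject)
    return all(n <= 1 for n in per_subject.values())  # at most one genome per subject


def draw_mock_community(
    pool: Sequence[Genome], rng: random.Random, size: int = 8, max_tries: int = 100_000
) -> List[Genome]:
    """Draw random candidate sets until one satisfies every constraint."""
    for _ in range(max_tries):
        candidate = rng.sample(pool, size)
        if is_valid_community(candidate):
            return candidate
    raise RuntimeError("no valid community found; pool may be too constrained")


def toy_pool() -> List[Genome]:
    """Small synthetic pool (placeholder names, not the 84 genomes of the study)."""
    subjects = ["CL02", "CL03", "CL09"]
    pool: List[Genome] = []
    for i in range(12):
        sp = f"Bacteroides sp{i}"
        pool.append(Genome(f"{sp} ref", sp, "Bacteroides"))
        pool.append(Genome(f"{sp} iso", sp, "Bacteroides", subjects[i % 3]))
    for i in range(3):
        sp = f"Parabacteroides sp{i}"
        pool.append(Genome(f"{sp} ref", sp, "Parabacteroides"))
        pool.append(Genome(f"{sp} iso", sp, "Parabacteroides", subjects[i]))
    return pool


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the pseudorandom draws are repeatable
    communities = [draw_mock_community(toy_pool(), rng) for _ in range(5)]
    print(all(is_valid_community(c) for c in communities))  # True
```

Rejection sampling is chosen here only for clarity; any scheme that enforces the same constraints would serve the same purpose.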

The mock-community BLAST analysis revealed only three 
unique segments of qualifying DNA that were ≥10 kb, >99.9% 
identical, and shared by 3 strains within a mock community but 
not by any other genomes in the BLAST comparison database 
(Table 3; see Table S4 in the supplemental material). The first of 
these regions is 12,502 bp and is contained in the same three Bac- 
teroides strains that were present in both mock community 59 and 
mock community 609, the second is 13,248 bp and is present in 
two Bacteroides and a P. merdae genome of one mock community, 
and the third region is 30,598 bp and was contained in three Bac- 
teroides genomes from one mock community. 

Therefore, in the two natural communities CL02 and CL03, 
five unique qualifying regions were retrieved with no other 
matches in the database at 99.9% or greater (mean of 2.5 regions 
per community), whereas only four such regions (including one 
unique region found in two different communities) were retrieved 
from similar analyses of 1,000 communities of non-coresident 
strains (mean of 0.004 regions per community). Moreover, many 
of the qualifying DNA segments detected in the real communities 
were larger than the segments detected in the mock communities. 
Therefore, the likelihood of detecting such highly similar and 
unique regions in a set of Bacteroidales strains that are coresident is 
625 times higher than the likelihood of detecting such a region 
among non-coresident strains, providing strong evidence that the 
five identified regions from the CL02 and CL03 ecosystems were 



transferred between strains while coresident in the gut microbiota 
of these humans. 
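
The 625-fold figure follows directly from the counts given above, and the arithmetic can be checked with exact fractions. The variable names below are introduced only for this illustration.

```python
from fractions import Fraction

# Counts stated in the text.
natural_regions, natural_communities = 5, 2   # CL02 and CL03
mock_regions, mock_communities = 4, 1000      # includes one region found in two mock communities

natural_rate = Fraction(natural_regions, natural_communities)  # regions per natural community
mock_rate = Fraction(mock_regions, mock_communities)           # regions per mock community
fold_enrichment = natural_rate / mock_rate

print(float(natural_rate))   # 2.5
print(float(mock_rate))      # 0.004
print(fold_enrichment)       # 625
```

Exact fractions are used so that the ratio is not affected by floating-point rounding of 0.004.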

Genetic content of the five transferred regions. Conjugative 
transposons or ICEs contain genes encoding all the functions for 
their transfer, including the machinery for the conjugative mating 
apparatus, which in Gram-negative bacteria largely occurs by type 
IV secretion systems (T4SS) (19). Regions 2, 4, and 5 each contain 
numerous genes encoding Tra proteins of T4SS machinery, in- 
cluding TraD, -G, -J, -K, -L, -M, and -N. These tra genes from each 
region have a similar genetic architecture, displaying a modular 
unit of functionally related genes, characteristic of ICEs (19). Re- 
gions 1 and 3 are likely contained on larger MGEs but were trun- 
cated in our analyses due to assembly scaffold breaks in at least one 
of the three qualifying genomes. For region 3, the scaffold from 
P. merdae extended beyond the defined region, and several smaller 
scaffolds from both the B. uniformis and B. dorei genomes aligned 
at 100% identity with the larger P. merdae sequence with relatively 
small gaps or overlaps, indicating that the true size of the transferred 
element is likely ~47 kb (Fig. 3). All of the same tra genes 
were contained in this extended region (Fig. 2), suggesting that 
this MGE is also an ICE. Region 1 also continued upstream for an 
additional 61.5 kb at near 100% identity in two of three genomes 
(Fig. 3). Alignment of this extended region with the B. cellulosil- 
yticus sequence indicated that the genome was likely misas- 
sembled in this area. However, for the two genomes that contin- 
ued, the same tra genes were identified (Fig. 2). Therefore, three of 
the five identified regions meet the definition of an ICE, with re- 
gions 1 and 3 also likely part of a larger ICE that was truncated in 
our analysis due to incomplete or incorrect assembly of the ge- 
nome sequences. 

These ICEs also contained other common genes, such as those 
encoding single-stranded-DNA-binding proteins, relaxases, 
ParBs, excisionases, TOPRIM-like proteins, ATPases similar to 
those involved in chromosomal partitioning, and proteins with 
DUF4133, DUF4134, and DUF4099 (Fig. 2, Table 4). Each of these 
regions also contains at least one gene with predicted site-specific 
recombinase activity, likely involved in integration of the element 
(Fig. 2, Table 4). 

As ICEs must excise from the donor genome in order to trans- 
fer to a recipient, some encode a toxin-antitoxin pair to ensure 
that they are not lost in the donor strain prior to replication and 
reintegration (25). Regions 1 to 5 each encode identifiable toxin- 
antitoxin or immunity proteins, likely for element maintenance 
(Table 4; see Table S5 in the supplemental material). In addition, 
each of these five regions encodes a predicted antirestriction pro- 
tein, frequently contained on a conjugative element, which facili- 
tates maintenance of the ICE in the recipient prior to its modifi- 
cation. 

Genes that may contribute to fitness. Each region also con- 
tains numerous genes unrelated to transfer and maintenance of 
the ICE. The majority of these genes encode hypothetical proteins 
of unknown function (see Table S5 in the supplemental material); 
however, many encode products with putative functions that sug- 
gest that they could contribute to fitness. Region 1 encodes genes 
likely involved in fimbria synthesis. Similar FimA orthologs in the 
oral Bacteroidales species Porphyromonas gingivalis allow this or- 
ganism to attach to host cells (reviewed in reference 26). In these 
gut Bacteroidales, these fimbriae may expand the niche of these 
organisms, allowing them to attach to other host, microbial, or 
dietary particle surfaces in the gut. 
TABLE 2 BLAST output of regions 1 to 6 against the database^a 

| BLAST target^b | % Identity^c | Alignment length | No. of MM^d | No. of gaps | Query start | Query end | Target start | Target end | Accession no. |
|---|---|---|---|---|---|---|---|---|---|
| Query — CL02 region 1 | | | | | | | | | |
| B. salyersiae CL02T12C01 | | | | | | | | | NZ_JH724307.1 |
| B. cellulosilyticus CL02T12C19 | | | | | | | | | NZ_JH724088.1 |
| B. dorei CL02T12C06 | | | | | | | | | NZ_JH724135.1 |
| B. eggerthii DSM 20697 | | | | | | | | | NZ_DS995509.1 |
| B. plebeius DSM 17135 | | | | | | | | | NZ_DS990131.1 |
| B. fragilis 3_1_12 | | | | | | | | | NZ_EQ973213.1 |
| Query — CL02 region 2 | | | | | | | | | |
| B. dorei CL02T12C06 | | | | | | | | | NZ_JH724134.1 |
| B. salyersiae CL02T12C01 | | | | | | | | | NZ_JH724309.1 |
| P. johnsonii CL02T12C29 | | | | | | | | | NZ_JH976468.1 |
| | | | | 1 | 62,451 | 116,095 | 178,109 | 231,758 | NZ_JH976468.1 |
| B. cellulosilyticus CL02T12C19 | 100.00 | 59,402 | 2 | 0 | 1 | 59,042 | 290,506 | 231,465 | NZ_JH724088.1 |
| | 99.99 | 13,339 | 1 | 0 | 59,002 | 72,340 | 231,364 | 218,026 | NZ_JH724088.1 |
| | 100.00 | 12,303 | 0 | 0 | 75,233 | 87,535 | 214,881 | 202,579 | NZ_JH724088.1 |
| | 100.00 | 28,560 | 1 | 0 | 87,536 | 116,095 | 202,478 | 173,919 | NZ_JH724088.1 |
| B. ovatus CL02T12C04 | 99.55 | 58,086 | 263 | 27 | 58,035 | 116,095 | 5,697 | 63,752 | NZ_JH724231.1 |
| Bacteroides sp. strain 3_2_5 | 98.71 | 33,023 | 426 | 41 | 2 | 33,010 | 2,030,884 | 2,063,856 | NZ_JH636044.1 |
| | 97.79 | 24,597 | 543 | 9 | 34,628 | 59,221 | 2,065,264 | 2,089,854 | NZ_JH636044.1 |
| Query — CL03 region 3 | | | | | | | | | |
| B. uniformis CL03T12C37 | 100.00 | 17,607 | 0 | 0 | 1 | 17,607 | 96 | 17,702 | NZ_JH724271.1 |
| P. merdae CL03T12C32 | 100.00 | 17,607 | 0 | 0 | 1 | 17,607 | 142,174 | 159,780 | NZ_JH976456.1 |
| B. dorei CL03T12C01 | 100.00 | 17,607 | 0 | 0 | 1 | 17,607 | 17,607 | 1 | NZ_JH724164.1 |
| B. eggerthii 1_2_48FAA | 98.53 | 17,614 | 245 | 7 | 1 | 17,607 | 30,388 | 47,994 | NZ_AKBX01000010.1 |
| B. plebeius DSM 17135 | 98.48 | 17,615 | 250 | 13 | 2 | 17,607 | 30,516 | 48,121 | NZ_DS990120.2 |
| B. intestinalis DSM 17393 | 98.50 | 16,772 | 236 | 12 | 2 | 16,766 | 17,245 | 34,007 | NZ_ABJL02000003.1 |
| Query — CL03 region 4 | | | | | | | | | |
| B. fragilis CL03T12C07 | 100.00 | 60,734 | 0 | 0 | 1 | 60,734 | 285,831 | 346,564 | NZ_JH724182.1 |
| P. distasonis CL03T12C09 | 100.00 | 60,734 | 0 | 0 | 1 | 60,734 | 2,432,090 | 2,492,823 | NZ_JH976495.1 |
| B. xylanisolvens CL03T12C04 | 100.00 | 60,734 | 2 | 0 | 1 | 60,734 | 2,000,696 | 1,939,963 | NZ_JH724294.1 |
| B. ovatus CL03T12C18 | 100.00 | 44,008 | 2 | 0 | 1 | 44,008 | 31,399 | 75,406 | NZ_JH724250.1 |
| | 100.00 | 16,726 | 0 | 0 | 44,008 | 60,733 | 215,190 | 231,915 | NZ_JH724243.1 |
| B. fragilis NCTC 9343 | 99.20 | 38,365 | 289 | 17 | 22,378 | 60,733 | 2,040,415 | 2,078,771 | NC_003228.3 |
| | 99.63 | 15,410 | 55 | 2 | 1,801 | 17,209 | 2,017,133 | 2,032,541 | NC_003228.3 |
| B. helcogenes P 36-108 | 99.60 | 15,423 | 58 | 3 | 1,801 | 17,221 | 230,238 | 245,659 | NC_014933.1 |
| B. uniformis ATCC 8492 | 95.50 | 16,906 | 652 | 53 | 30,502 | 47,366 | 215,548 | 198,710 | NZ_DS362247.1 |
| Query — CL03 region 5 | | | | | | | | | |
| B. xylanisolvens CL03T12C04 | 100.00 | 42,545 | 0 | 0 | 1 | 42,545 | 1,171,697 | 1,214,241 | NZ_JH724294.1 |
| B. fragilis CL03T12C07 | 100.00 | 42,545 | 0 | 0 | 1 | 42,545 | 457,382 | 414,838 | NZ_JH724184.1 |
| B. uniformis CL03T12C37 | 100.00 | 42,545 | 2 | 0 | 1 | 42,545 | 725,544 | 768,088 | NZ_JH724268.1 |
| B. ovatus CL03T12C18 | 100.00 | 28,967 | 1 | 0 | 13,578 | 42,544 | 205,601 | 176,635 | NZ_JH724243.1 |
| Bacteroides sp. strain 3_1_23 | 96.50 | 18,468 | 561 | 50 | 16,314 | 34,740 | 2,449,865 | 2,431,442 | NZ_GG774949.1 |
| B. finegoldii DSM 17565 | 96.60 | 17,611 | 497 | 57 | 17,192 | 34,740 | 29,060 | 46,630 | NZ_GG688325.1 |
| B. salyersiae DSM 18765 | 97.03 | 16,978 | 442 | 35 | 17,790 | 34,740 | 554,600 | 537,659 | NZ_KB905466.1 |
| Query — CL03 region 6 | | | | | | | | | |
| B. xylanisolvens CL03T12C04 | 100.00 | 44,124 | 0 | 0 | 1 | 44,124 | 388,361 | 344,238 | NZ_JH724296.1 |
| P. merdae CL03T12C32 | 100.00 | 26,817 | 0 | 0 | 1 | 26,817 | 204,214 | 231,030 | NZ_JH976457.1 |
| | 100.00 | 16,701 | 0 | 0 | 27,424 | 44,124 | 236,576 | 253,276 | NZ_JH976457.1 |
| B. ovatus CL03T12C18 | 100.00 | 12,583 | 0 | 0 | 1 | 12,583 | 530,345 | 542,927 | NZ_JH724241.1 |
| | 100.00 | 23,711 | 1 | 0 | 12,584 | 36,294 | 545,391 | 569,101 | NZ_JH724241.1 |
| B. eggerthii DSM 20697 | 100.00 | 44,124 | 0 | 0 | 1 | 44,124 | 159,910 | 204,033 | NZ_DS995511.1 |
| P. merdae CL09T00C40 | 100.00 | 44,124 | 0 | 0 | 1 | 44,124 | 372,514 | 416,637 | NZ_JH976526.1 |
| Bacteroides sp. strain 3_1_19 | 100.00 | 44,124 | 0 | 0 | 1 | 44,124 | 180,923 | 225,046 | NZ_GG774763.1 |
| Bacteroides sp. strain D22 | 100.00 | 44,124 | 0 | 0 | 1 | 44,124 | 56,078 | 100,201 | NZ_GG774819.1 |
| Alistipes sp. strain HGB5 | 100.00 | 44,124 | 0 | 0 | 1 | 44,124 | 66,384 | 110,507 | NZ_AENZ01000040.1 |
| Alistipes onderdonkii DSM 19147 | 100.00 | 44,124 | 1 | 0 | 1 | 44,124 | 55,386 | 11,263 | NZ_KB894552.1 |
| B. intestinalis DSM 17393 | 100.00 | 44,124 | 0 | 1 | 1 | 44,124 | 456,857 | 412,735 | NZ_ABJL02000006.1 |
| B. stercoris ATCC 43183 | 100.00 | 44,124 | 2 | 0 | 1 | 44,124 | 103,168 | 59,045 | NZ_DS499672.1 |
| P. merdae ATCC 43184 | 100.00 | 44,124 | 1 | 1 | 1 | 44,124 | 73,390 | 117,512 | NZ_DS264518.1 |
| B. fragilis YCH46 DNA | 100.00 | 44,124 | 1 | 1 | 1 | 44,124 | 163,822 | 119,700 | NC_006347.1 |

a All variant IS and RE were removed from query sequences. Boldface indicates strains from a natural ecosystem. 
b All species belong to Bacteroides or Parabacteroides, unless otherwise indicated. 
c % Identity was rounded to the closest hundredth of a percent. 
d MM, mismatches. 
TABLE 3 BLAST output of three unique regions from the mock communities against the database^a 

| BLAST query (accession no.:position), target^b | % Identity^c | Alignment length | No. of MM^d | No. of gaps | Query start | Query end | Target start | Target end | Accession no. |
|---|---|---|---|---|---|---|---|---|---|
| Query — B. stercoris ATCC 43183 (NZ_DS499676.1:176961-207558) | | | | | | | | | |
| B. stercoris ATCC 43183 | 100.00 | 30,598 | 0 | 0 | | 30,598 | 176,961 | 207,558 | NZ_DS499676.1 |
| B. vulgatus PC510 | 99.96 | 30,599 | 10 | 2 | 1 | 30,598 | 30,597 | 1 | NZ_ADKO01000036.1 |
| B. uniformis ATCC 8492 | 99.95 | 30,602 | 9 | 3 | 1 | 30,598 | 176,746 | 207,345 | NZ_DS362245.1 |
| B. cellulosilyticus CL02T12C19 | 99.80 | 13,767 | 23 | 3 | | 13,764 | 624,314 | 610,549 | NZ_JH724089.1 |
| B. vulgatus ATCC 8482 | 99.66 | 25,496 | 76 | 10 | | 25,491 | 2,046,625 | 2,021,136 | NC_009614.1 |
| P. merdae ATCC 43184 | 99.61 | 25,500 | 83 | 12 | | 25,491 | 117,639 | 92,147 | NZ_DS264524.1 |
| Query — B. fragilis HMW 616 (NZ_JH815527.1:1-13248) | | | | | | | | | |
| B. fragilis HMW 616 | 100.00 | 13,248 | 0 | 0 | 1 | 13,248 | 1 | 13,248 | NZ_JH815527.1 |
| | 100.00 | 13,248 | 0 | 0 | 1 | 13,248 | 80,750 | 67,503 | NZ_JH815526.1 |
| P. merdae ATCC 43184 | 99.99 | 13,248 | 1 | 0 | 1 | 13,248 | 356,073 | 342,826 | NZ_DS264540.1 |
| Bacteroides sp. strain 4_3_47FAA | 99.99 | 13,248 | 1 | 0 | 1 | 13,248 | 561,549 | 548,302 | NZ_JH114362.1 |
| B. coprocola DSM 17136 | 89.09 | 8,440 | 800 | 87 | 4,875 | 13,248 | 8,644 | 17,028 | NZ_DS981488.1 |
| B. plebeius DSM 17135 | 89.09 | 8,440 | 800 | 87 | 4,875 | 13,248 | 241,574 | 249,958 | NZ_DS990119.1 |
| B. finegoldii CL09T03C10 | 86.88 | 5,349 | 638 | 46 | 7,935 | 13,248 | 82,421 | 77,102 | NZ_JH951901.1 |
| Query — B. faecis MAJ27 (NZ_AGDG01000049.1:1-12502) | | | | | | | | | |
| B. faecis MAJ27 | 100.00 | 12,502 | 0 | 0 | 1 | 12,502 | 1 | 12,502 | NZ_AGDG01000049.1 |
| B. plebeius DSM 17135 | 99.98 | 12,502 | 2 | 0 | 1 | 12,502 | 28,019 | 15,518 | NZ_DS990120.2 |
| B. intestinalis DSM 17393 | 99.98 | 12,502 | 2 | 1 | 1 | 12,502 | 14,748 | 2,248 | NZ_ABJL02000003.1 |
| Bacteroides sp. strain D22 | 99.87 | 12,502 | 0 | 4 | 1 | 12,502 | 35,210 | 22,725 | NZ_GG774809.1 |
| P. merdae CL03T12C32 | 98.71 | 8,731 | 112 | 1 | 603 | 9,333 | 139,124 | 130,395 | NZ_JH976456.1 |
| Bacteroides sp. strain 9_1_42FAA | 98.67 | 9,241 | 120 | 1 | 2,512 | 11,752 | 25,552 | 34,789 | NZ_EQ973174.1 |

a Boldface indicates strains from a natural ecosystem. 
b All species belong to Bacteroides or Parabacteroides. 
c % Identity was rounded to the closest hundredth of a percent. 
d MM, mismatches. 



[Figure 3 graphic. (A) Region 1 in B. cellulosilyticus CL02T12C19, B. salyersiae CL02T12C01, and B. dorei CL02T12C06, with the positions of ISa, ISb, ISc, and REa marked. (B) Region 3: P. merdae CL03T12C32 supercontig 1.5 (46,518 bp) aligned with smaller scaffolds from B. uniformis CL03T12C37 and B. dorei CL03T12C01, with scaffold names, scaffold sizes, and gap sizes indicated.] 



FIG 3 Likely extent of the MGEs containing regions 1 and 3. Boxed regions are the extent of regions 1 and 3 identified by the indicated BLAST criteria. (A) 
Expansion of region 1 in two of the three genomes. (B) Expansion of region 3 based on smaller matching scaffolds in each of the two genomes that are 
noncontiguous with the region from P. merdae. 
TABLE 4 Numbers of various products encoded by the five intracommunity-transferred regions



No. of products in CL02 regions (1, 2) and CL03 regions (3, 4, 5)

Putative assignment/function of gene products | 1 2 | 3 | 4 5

Conjugative transfer machinery
  TrciO ( rmiTilifio rwn+pin 1 | 1 1 | 1 | 1 1
  TraG |  | 1 | 1 1
  TraJ |  | 1 | 1 1
  TraK | 1 1 | 1 | 1 1
  TraM | 1 1 | 1 | 1 1
  TraN | 1 1 | 1 | 1 1
  TraO |  |  | 1 1

Recombinases
  Serine site-specific recombinases; tyrosine site-specific recombinases/integrases | 2 1 | 1 | 2 2

Element transfer/partitioning/segregation
  TOPRIM-like, DUF3991 | 1 1 | 1 |
  TOPRIM primase | 1 |  | 1 1
  Excisionase | 1 |  | 1
  Single-stranded-DNA-binding protein family | 3 2 |  |
  ATPases, chromosome partitioning/CobQ/CobB/MinD/ParA nucleotide binding | 1 1 | 1 | 1 1
  rAlAlj aySLClll x^d-llj 1 all lily | 1 1 |  |
  Chromosome segregation protein SMC | 1 1 |  |
  Relaxase/mobilization nuclease |  | 1 | 1 1

Other common proteins/domains
  RibD C-terminal domain, dihydrofolate reductase |  |  | 1 1
  fit TTMnQQ | 1 1 | 1 | 1 1
  DUF4133 | 1 |  | 1 1
  DUF4134 |  | 1 | 1 1
  DUF3408 |  | 1 | 1 1
  PH domain protein | 1 1 | 1 |

Transcriptional regulation/DNA binding
  RteC family |  |  | 1 1
  TetR family |  | 1 |
  Other transcriptional regulator | 2 |  | 1 1
  Other helix-turn-helix domain DNA-binding proteins | 1 | 1 | 4 1

Selfish genes/element survival
  Putative toxin | 1 1 | 1 | 2 1
  Putative antitoxin/immunity protein | 1 2 | 1 | 4 1
  Anti-restriction protein | 1 1 | 1 | 1 1
  DNA methylase | 1 3 |  |

Potential fitness genes
  Fimbria synthesis | 2 |  |
  MACPF domain containing |  | 1 |
  M23 peptidase family | 2 3 | 1 |
  Type VI secretion system (T6SS) ✓a

a ✓, the region is present in the organism.



Region 2 encodes three putative orphan DNA methyltrans- 
ferases not associated with a cognate restriction enzyme. DNA 
methyltransferases enable genomewide epigenetic modifications 
which have been shown to have diverse outcomes, including tran- 
scriptional regulation, cell cycle control, and regulation of conju- 
gal transfer (27, 28). Therefore, these newly acquired genes may 
have significant effects on recipient fitness. 

There are also genes in these regions that may contribute to 
competitive ecological interactions. Regions 2 and 3 contain a 
total of four predicted M23 peptidases (Table 4; see Table S5 in the 
supplemental material) that hydrolyze peptidoglycan and have 
various physiological functions, including bacteriocin activity 
(29). In addition, region 3 encodes a protein with a membrane 
attack/perforin (MACPF) domain found in proteins widely dis- 
tributed in Bacteroidetes species, one of which we have shown to 
have secreted antimicrobial activity targeting heterologous strains 
(M. Chatzidaki-Livanis et al., submitted for publication). 

The most notable feature of these regions is a large cluster of
genes in region 2 encoding characteristic type VI secretion system
(T6SS) proteins (Fig. 2 and Fig. 4; see Table S6 in the supplemental
material). Type VI secretion systems are widely distributed among
Proteobacteria but have not previously been reported in Bacte- 
roidetes. T6SSs translocate toxic effector proteins into neighboring 
cells in a contact-dependent manner, killing sensitive cells (re- 
viewed in references 30 and 31). T6SS loci are very diverse, and 
certain hallmark T6SS proteins exhibit little pairwise identity in 
sequence-sequence comparison. Thus, the identification of these 
core proteins often relies on the presence of certain motifs 
(sequence-profile comparisons) or on remote homologies detect- 
able by profile-profile comparisons or structural similarities. 

Such profile-profile analyses (32, 33) reveal that this locus encodes
numerous proteins characteristic of T6SS loci, including TssI (VgrG)
and TssD (Hcp), two proteins that comprise the T6SS cell-puncturing
structure; the contractile sheath proteins TssB and TssC; the phage
baseplate-like protein TssE; and the TssH (ClpV) ATPase, thought to
be involved in recycling of TssB and TssC.

The locus also encodes proteins identified as TssF, TssG, and
TssK, T6SS proteins whose functions are less well understood, and a
large transmembrane protein with both a GTP-ATP binding domain
and a P-loop ATPase domain, both of which are structural
features of TssM, a protein involved in anchoring the T6SS apparatus
to the cell wall. Additionally, this locus encodes an Rhs protein
with a deaminase domain and two putative immunity proteins,
features that are also found associated with T6SS loci. As
TssM (34), TssK (35), TssG, and TssF are associated with T6SS but
not phage, it is unlikely that this region is an integrated phage.
Although T6SS loci have been predicted to be transferred between
strains by HGT, this is the first description of a putative T6SS locus
likely being transferred on a conjugative element between strains
within a natural human ecosystem.

[Figure 4 graphic: ORF map of B. cellulosilyticus CL02T12C19 genes HMPREF1062_03942 to HMPREF1062_03915, with a scale in kb. Color key: TssB, TssC, TssD (Hcp), TssE, TssF, TssG, TssH (ClpV), TssI (VgrG), TssK, TssM, Rhs family protein, immunity protein.]

FIG 4 ORF map of portion of region 2, encoding a putative T6SS. Genes encoding proteins characteristic of or commonly associated with T6SS are color coded
as indicated below. These designations are based on the analyses as outlined in Table S6 in the supplemental material. The putative functions of all gene products
encoded by the genes shown here are included in Table S6.

DISCUSSION 

By analyzing the genomes of Bacteroidales strains cocolonizing the
guts of two humans, we provide evidence that as much as 140 kb of
DNA has been exchanged among several strains in the microbiota
of two individuals and suggest that ICEs are likely responsible
for this transfer. These transfers were not limited to Bacteroides
species; they also included Parabacteroides species. Bacteroides
are contained within the family Bacteroidaceae and
Parabacteroides within the family Porphyromonadaceae, and as
such, the Parabacteroides are more closely related phylogenetically
to the oral pathogens Porphyromonas gingivalis and Tannerella forsythia
than to the Bacteroides genus. However, the Parabacteroides have
many phenotypes that are more in common with the Bacteroides
than with the oral Porphyromonadaceae. A few notable phenotypes
include the synthesis of multiple phase-variable capsular
polysaccharides (36) and the production of the enzyme Fkp,
which allows these bacteria to incorporate salvaged fucose from
the gut environment into their glycans (37). The data from the
current study reveal the tremendous capacity for species of these
different families to share numerous phenotypes encoded by these
ICEs. In fact, these data show that a Bacteroides strain and a
Parabacteroides strain living together in the same human gut share
many features that are not shared with other, non-coresident
members of the same genus/species.

These genomic comparisons document the continued evolu- 
tion of these ICEs, which are subject to continued bombardment 
with IS and RE elements, likely from the recipient's genome. These 
modifications result in highly personalized genomes that are likely 
unique to each human. These data also reveal the extent to which 
our Bacteroidales strains are likely altered by the other members of 
our gut microbial community. 

In this retrospective study, we cannot determine which of these
strains may have been the donor of the ICE and which the recipients.
However, due to the presence of particular IS or RE in an
ICE of one or two strains but not all, some predictions can be
made. For example, ISa and ISb are each present in the exact same
locations of region 1 for both B. cellulosilyticus and B. salyersiae,
but both are absent in B. dorei (Fig. 1). Therefore, it is unlikely that
B. dorei received this ICE from either B. cellulosilyticus or B. salyersiae.
In addition, as both B. cellulosilyticus and B. salyersiae each
contain other copies of both of these IS in their genomes, these
elements were likely transferred from one member's chromosomal
copy to the ICE and then transferred to the other strain. In
the recipient, the IS present on the ICE then could have served as
the donor for transposition into other areas of its chromosome.
The data clearly demonstrate that ICEs are efficient vehicles for the
transfer of IS and RE between coresident strains (38).

Although ICEs are selfish elements and contain numerous 
genes dedicated to their transmission and maintenance, the car- 
riage of fitness-conferring genes would increase the chance that 
the recipient of an ICE is maintained in the ecosystem. Indeed, 
elements transferred by HGT are known to encode fitness- 
conferring traits (24), the most obvious being genes encoding an- 
tibiotic resistance. In this way, HGT is a means to allow for rapid 
adaptation of new members into specific adapted communities 
(39). 

In analyzing the contents of these five genetic elements, we can 
speculate as to the influence on fitness of the transfer and acqui- 
sition of these ICEs. The predicted T6SS encoded by region 2 and 
the putative antimicrobial molecules encoded by regions 2 and 3 
are examples of transfers/acquisitions that may be advantageous 
to both the donor and recipient. The recipient is now endowed 
with machinery that may allow it to promote antagonistic inter- 
actions to limit competition, and the donor may benefit in that the 
recipient can now deploy this energetically costly defensive ma- 
chinery and share the burden of protecting the ecosystem from 
invasion. In Pseudomonas aeruginosa, a T6SS was shown to be 
assembled in response to mating pair formation by a T4SS of 
Escherichia coli, and therefore, it functions to prevent conjugal 
DNA transfer by killing the attempting donor strain (40). This 
response is postulated to block the acquisition of parasitic foreign 
DNA. It will be interesting to determine whether the Bacteroidales 
species that acquired the T6SS are now unable to receive addi- 
tional T4SS-mediated DNA transfers and, if so, whether it is an 
advantage or disadvantage for these strains in the human gut eco- 
system. 
The identification of these intracommunity-transferred ICEs
will allow for more in-depth analyses to address ecological inter- 
actions between these strains and other Bacteroidales strains of 
these natural communities that do not contain these elements. 
Because the majority of the genes on the five identified elements 
encode proteins of unknown function, there are potentially nu- 
merous advantages that these regions could confer to a recipient in 
its interactions with the host and other community members. As 
these strains represent the evolutionary winners at the time of 
their isolation, it is unlikely that these ICEs conferred an overall 
fitness disadvantage to the recipients. The isolation of additional 
Bacteroidales strains from these same subjects will allow us to de- 
termine whether strains containing these ICEs have been main- 
tained over time and/or whether the ICEs have since been trans- 
ferred to the remaining Bacteroidales members of these 
communities. 

MATERIALS AND METHODS 

Strains and genome sequences. The 15 CL02 and CL03 Bacteroidales 
strains of this study were isolated from human feces, as described previ- 
ously (3), as part of a study approved by the Partners Human Research 
Committee IRB that complied with all relevant federal guidelines and 
institutional policies. The genome sequencing of these strains was per- 
formed at the Broad Institute as part of the Human Microbiome Project 
(41). These sequences were deposited in GenBank and are identified by 
their project accession numbers, as follows: Bacteroides caccae,
CL03T12C61 and PRJNA64801; B. cellulosilyticus, CL02T12C19 and
PRJNA64803; B. dorei, CL02T12C06 and PRJNA64807; B. dorei,
CL03T12C01 and PRJNA64809; B. fragilis, CL03T12C07 and
PRJNA64813; Bacteroides nordii, CL02T12C05 and PRJNA64823; B. ovatus,
CL02T12C04 and PRJNA64825; B. ovatus, CL03T12C18 and
PRJNA64827; B. salyersiae, CL02T12C01 and PRJNA64829; B. uniformis,
CL03T12C37 and PRJNA64835; B. xylanisolvens, CL03T12C04 and
PRJNA64839; P. distasonis, CL03T12C09 and PRJNA64883; Parabacteroides
goldsteinii, CL02T12C30 and PRJNA64887; P. johnsonii,
CL02T12C29 and PRJNA64889; and P. merdae, CL03T12C32 and
PRJNA64891.

Intracommunity genome comparisons. The genomes comprising
each of the mock or natural communities were compared to one another
at the DNA level using BLAST. All hits of >10,000 bp that shared >99.9%
identity were retained, with redundancy due to reciprocal hits eliminated.
The BLAST files were parsed to detect instances in which a particular
query scaffold returned multiple qualifying segments (>10 kb at >99.9%
identity) against a particular target scaffold. These results were
consolidated and counted as one qualifying hit if the gaps between the query
sequence coordinates were <5,000 bp or if the query coordinates
overlapped. If the same segment of query DNA produced multiple qualifying
returns from different scaffolds of the same target genome, this was also
counted as one hit.
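The consolidation rule described above can be illustrated with a minimal Python sketch. It assumes the BLAST results are available in the standard 12-column tabular format; the function names (parse_hits, consolidate) are hypothetical and are not taken from the study. The sketch groups hits by target scaffold only, so the further step of treating different scaffolds of one target genome as a single hit, and the removal of reciprocal hits, are left out.

```python
from collections import defaultdict

MIN_LEN = 10_000    # qualifying hits are longer than 10,000 bp
MIN_IDENT = 99.9    # and share more than 99.9% identity
MAX_GAP = 5_000     # query segments closer than this are merged


def parse_hits(lines):
    """Yield (query, target, qstart, qend) for qualifying tabular BLAST lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comments and malformed lines
        query, target = fields[0], fields[1]
        ident, length = float(fields[2]), int(fields[3])
        qstart, qend = sorted((int(fields[6]), int(fields[7])))
        if length > MIN_LEN and ident > MIN_IDENT:
            yield query, target, qstart, qend


def consolidate(hits, max_gap=MAX_GAP):
    """Merge query segments per (query, target) pair when they overlap
    or are separated by fewer than max_gap bp."""
    grouped = defaultdict(list)
    for query, target, qstart, qend in hits:
        grouped[(query, target)].append((qstart, qend))
    merged = {}
    for key, segments in grouped.items():
        segments.sort()
        out = [list(segments[0])]
        for start, end in segments[1:]:
            if start - out[-1][1] < max_gap:  # overlap gives a negative gap
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[key] = [tuple(seg) for seg in out]
    return merged
```

Each merged interval then counts as one qualifying hit for that query and target pair.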

Once consolidated, the BLAST results were further parsed for contig- 
uous query sequences producing qualifying matches against two or more 
target genomes within a community. The overlapping relationship be- 
tween these BLAST hits was analyzed to calculate the longest contiguous 
stretch of query DNA present in the target genomes under examination, 
and the query DNA thus defined was extracted from the proper scaffolds 
of the query genome. 
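One way to picture the "longest contiguous stretch" calculation is as a coverage sweep over the query coordinates. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function name longest_shared_stretch is hypothetical, the intervals for each target genome are assumed to be already consolidated (non-overlapping within a target), and a stretch is taken to qualify wherever at least min_targets target genomes cover it.

```python
def longest_shared_stretch(intervals_by_target, min_targets=2):
    """Return (start, end) of the longest query stretch covered by at
    least min_targets target genomes, or None if there is none.

    intervals_by_target maps a target genome name to a list of
    1-based, inclusive (start, end) query intervals.
    """
    events = []
    for intervals in intervals_by_target.values():
        for start, end in intervals:
            events.append((start, 1))      # coverage begins at start
            events.append((end + 1, -1))   # and stops after end
    # At equal positions, process starts before ends so that
    # back-to-back intervals do not split a contiguous stretch.
    events.sort(key=lambda e: (e[0], -e[1]))

    best = None
    depth = 0
    run_start = None
    for pos, delta in events:
        before = depth
        depth += delta
        if before < min_targets <= depth:
            run_start = pos
        elif depth < min_targets <= before:
            run = (run_start, pos - 1)
            if best is None or run[1] - run[0] > best[1] - best[0]:
                best = run
    return best
```

The returned coordinates could then be used to extract the corresponding DNA from the query scaffold.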

Analysis of segments found in the natural communities. Sequences
flanking the >10-kb, >99.9% identity segments present in three or more
genomes of either of the two natural communities and that returned no
qualifying hits from the comparison database were compared to identify
areas where the sequences diverged. Once the ends of each region were
established, the DNA sequences were recovered from all participating
genomes and aligned using Clustal W2 (42). Areas where the multiple
sequence alignment disagreed (for example, due to stretches of unaligned
sequence from one or more genomes or from Ns inserted during genome
sequence assembly, SNPs, etc.) were examined by PCR and/or sequencing
(see Table S1 in the supplemental material). The sequencing-corrected
and/or PCR-confirmed DNA sequences were realigned, and several relatively
short stretches of unaligned DNA present in a subset of the genomes
due to the presence of IS or RE were removed. The sequences were
translated using Prodigal version 2.6 trained on the appropriate full genome
(43).

Selection of genomes for mock-community analysis. 156 genomes 
identified by NCBI as Bacteroides or Parabacteroides species were retrieved 
from the RefSeq repository. Genomes from species originating from non- 
human sources (e.g., Bacteroides salanitronis, acquired from a chicken 
cecum, or Bacteroides helcogenes, acquired from pig feces) were eliminated 
from the collection. Five duplicate genomes were also removed (B. dorei
CL02T00C15, B. uniformis CL03T00C23, and B. fragilis strains
CL03T00C08, CL05T00C42, and CL07T00C01 each correspond to a
CL0xT12Cxx strain isolated at a different time point from the same subject).
The genome sequences of the T00 and the T12 isolates are nearly
identical, and including them would have introduced unnecessary duplication.
Individual databases prepared for each of the remaining genomes
were queried via BLAST with a set of 16S ribosomal DNA sequences ac- 
quired from the Ribosomal Database Project (RDP), release 11.1 (44), 
representing the Bacteroides or Parabacteroides type strains. The highest- 
scoring segment pair resulting from each BLAST search was extracted 
from the target genome and examined further. Genomes with extracted 
segments of <1,000 bp were excluded, and the remaining segments were
used as queries against the RDP database to confirm the species assigned 
to the genome or assign a species designation to a genome annotated only 
to the genus level. Genomes whose species identification by this method 
was ambiguous or appeared incorrect were eliminated from the local col- 
lection. Ultimately, 84 genomes representing 26 Bacteroides and Parabac- 
teroides species were retained. 

Presence of DNA regions in noncommunity members. A collection 
of genomes was retrieved from NCBI to evaluate whether a qualifying 
DNA segment was unique to the community in which it was found. All 
DNAs contained in the RefSeq collection classified by NCBI as belonging 
to taxonomy ID 976 (phylum Bacteroidetes) that did not arise from met- 
agenomic or environmental samples and were not also members of tax- 
onomy ID 32644 (unclassified, e.g., unspecified or unidentified samples) 
were retrieved as FASTA files via the Web. This collection was further 
processed locally to remove entries whose sequences consisted entirely of 
rRNA genes and project info files. Scaffolds comprising genomes known 
to be duplicates were also removed. 

Each qualifying segment of DNA found to exist in three or more genomes
of a community was compared via BLAST to this comparison
database. Only hits from outside the mock community were considered.
The comparison database BLAST results were examined to enumerate the
number of qualifying hits (>10 kb at >99.9% identity) returned. Multiple
qualifying returns originating from the same target genome were scored as
a single hit.
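The scoring rule above (one hit per target genome, community members excluded) can be sketched as follows. This is an illustrative sketch only: the function name count_outside_hits, the tuple layout of the hits, and the scaffold_to_genome lookup are assumptions made for the example and do not come from the study.

```python
def count_outside_hits(hits, scaffold_to_genome, community,
                       min_len=10_000, min_ident=99.9):
    """Count distinct genomes outside the community with a qualifying hit.

    hits: iterable of (target_scaffold, percent_identity, alignment_length)
    scaffold_to_genome: hypothetical lookup from scaffold accession to genome
    community: set of genome names belonging to the community under study
    """
    genomes = set()
    for scaffold, ident, length in hits:
        if length > min_len and ident > min_ident:
            genome = scaffold_to_genome[scaffold]
            if genome not in community:
                genomes.add(genome)  # repeat hits to one genome count once
    return len(genomes)
```

A count of zero would indicate that the segment was not found at this level of identity in any genome of the comparison database outside the community.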

Annotation of genes residing on regions 1 to 5. The utilities of the
HMMER suite version 3.1b1 (45) were compiled under Cygwin (version
1.7.27; http://www.cygwin.com), and hmmpress was used to convert the
Pfam-A data files (version 27) (46) to binaries. Each of the protein
sequences from the Prodigal-translated sequences was scanned under
Cygwin for matches to the Pfam-A set of motifs using hmmscan, with the
sequence and domain E value cutoffs each set to 1.0.
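As a rough illustration of the scan described above, the sketch below builds the hmmpress and hmmscan command lines in Python. It assumes that the stated cutoffs correspond to the hmmscan reporting thresholds -E (per sequence) and --domE (per domain); the text does not say which options were used, and the file names are placeholders rather than the study's files.

```python
def hmmpress_command(pfam_hmm):
    """Command to convert a Pfam-A HMM flat file to binary index files."""
    return ["hmmpress", pfam_hmm]


def hmmscan_command(pfam_hmm, proteins_fasta, domtbl_out, evalue=1.0):
    """Command to scan protein sequences against the pressed Pfam-A database.

    -E and --domE set the per-sequence and per-domain reporting E-value
    thresholds (assumed here to be the cutoffs meant in the text);
    --domtblout writes a parseable per-domain table.
    """
    return [
        "hmmscan",
        "-E", str(evalue),
        "--domE", str(evalue),
        "--domtblout", domtbl_out,
        pfam_hmm,
        proteins_fasta,
    ]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(hmmpress_command("Pfam-A.hmm")))
    print(" ".join(hmmscan_command("Pfam-A.hmm", "region_proteins.faa",
                                   "region_pfam.domtbl")))
```

With HMMER installed, each list could be passed to subprocess.run(cmd, check=True) to execute the step.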

The position-specific score matrix (PSSM) files from NCBI's Con- 
served Domain Database (CDD, version 3.10) (47) were sorted by source 
database (Entrez models, SMART version 6.0, TIGRFAM version 13.0, 
COG and KOG, and LOAD). The PSSM files corresponding to NCBI's 
Protein Clusters database were further separated into curated prokaryotic 
and nonprokaryotic groups based on the naming convention of the PSSM 
files (48). Each of these groupings of PSSM files was compiled separately 
into RPS-BLAST databases using the NCBI makeprofiledb utility with 
default settings. Protein sequences derived from the conserved sequences
were scanned for conserved motifs using the NCBI rpsblast utility. The 
results of these motif scans and those of the Pfam-A scans were collected 
for each protein and used to inform the annotation (see Table S5 in the 
supplemental material). 

The segment encoding the predicted T6SS detected in region 2 was
more extensively analyzed using the HHpred server
(http://toolkit.tuebingen.mpg.de/hhpred) (32). The use of HMM-HMM profile
comparisons and comparisons to structured proteins contained in the
Protein Data Bank (PDB; http://www.rcsb.org/pdb) (49) allowed the
detection of remote homologs not detectable by sequence-sequence or
sequence-profile analyses.

SUPPLEMENTAL MATERIAL 